Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7412
Conversation
@wilhuff Any suggestions on different approaches/potential pitfalls/etc. are very welcome.

@wilhuff Looks like extending the lifetime of […]

Hmm, I'm getting an ASan error in CI. I'll investigate and let you know when this PR is ready for another look.

OK, it's ready for review again. The simpler approach doesn't work, and the previous approach needed a fix.
```diff
@@ -86,6 +86,7 @@ void AsyncQueue::ExecuteBlocking(const Operation& operation) {
       "ExecuteBlocking may not be called "
       "before the previous operation finishes executing");

+  auto self = shared_from_this();
```
Is there a reason we haven't done this before? `is_operation_in_progress_ = false;` may crash when `api::Firestore` (and, consequently, the worker queue) is destroyed on the queue (in `FirestoreClient::TerminateAsync`).
This seems like an oversight.

I think the idea at the time was that `FirestoreClient` would keep the `AsyncQueue` alive as long as it needed to, but this doesn't account for the case where the `Firestore` destructor is running on the `AsyncQueue`.
```cpp
worker_queue_->EnqueueEvenWhileRestricted([&, this, callback] {
  // Make sure this `FirestoreClient` is not destroyed during
  // `TerminateInternal`, so that `user_executor_` stays valid.
  auto self = shared_from_this();
```
If we're depending on the `shared_ptr` keeping the instance alive, we should create it outside the callback. Otherwise, `this` can be destroyed while this lambda is awaiting execution.

It seems like both callbacks that call `TerminateInternal` should capture a `shared_ptr<FirestoreClient>` to ensure the instance lives past that point.
```diff
@@ -223,6 +225,9 @@ class FirestoreClient : public std::enable_shared_from_this<FirestoreClient> {
   bool credentials_initialized_ = false;
   local::LruDelegate* _Nullable lru_delegate_;
   util::DelayedOperation lru_callback_;

+  // Used during shutdown to guarantee lifetimes.
+  std::weak_ptr<api::Firestore> weak_firestore_;
```
The type here doesn't actually matter since we don't need to manipulate this instance. Could this be `std::weak_ptr<void>`?
```cpp
  TerminateInternal();
});
bool enqueued = false;
// If `remote_store_` is null, it means termination has finished already. In
```
There's no reference to `remote_store_` in the code below, and `remote_store_` isn't actually a good signal anyway, since it is reset asynchronously. I think you're referring to `is_terminated()`, right?

Second, phrasing this in terms of "the sequential order invariant" obscures what's going on. I think the issue is that `EnqueueEvenWhileRestricted` can't be called while already on the async queue. Restated that way, it seems like we should just fix that rather than trying to introduce yet another guard on top of it.
```diff
@@ -263,7 +274,13 @@ void FirestoreClient::Dispose() {

 void FirestoreClient::TerminateAsync(StatusCallback callback) {
   worker_queue_->EnterRestrictedMode();
-  worker_queue_->EnqueueEvenWhileRestricted([this, callback] {
+  worker_queue_->EnqueueEvenWhileRestricted([&, this, callback] {
```
Default capture by reference in an asynchronously executing block is a recipe for disaster. It doesn't even seem like it's necessary. Remove?
```cpp
// dangerous in the case when `Dispose` is invoked immediately after the
// termination, still on the worker queue -- it would break the sequential
// order invariant of the queue.
if (!is_terminated()) {
```
This is racy. Two threads could observe `!is_terminated()` simultaneously and then both attempt the block below.

A better solution might be to have `EnqueueEvenWhileRestricted` do the right thing. It already has logic to deal with the race inherent in two callers attempting to dispose the instance. Perhaps we just need to make it `EnqueueRelaxedEvenWhileRestricted`?
Closing in favor of #7421.
…last shared pointer to it is in a listener (#7421)

If a user calls terminate and immediately disposes their reference to Firestore, it's possible that while termination is in progress, the last remaining reference (more precisely, shared pointer) to Firestore is in a listener. When that listener is destroyed as part of the termination, `api::Firestore` is destroyed as well. Importantly, termination happens on the worker queue. The destructor of `api::Firestore` calls `Dispose`, which presumes it's not called from the worker queue and tries to enqueue work, leading to a failing assertion and a crash.

The solution is simply to remove the sequential order checks when the queue enters/is in restricted mode. There is a legitimate case where the Firestore destructor can run on the worker queue, as the original issue shows. The complexity of #7412 also indicates that a simpler solution is preferable.

Fixes #6909.
If a user calls terminate and immediately disposes their reference to Firestore, it's possible that while termination is in progress, the last remaining reference (more precisely, shared pointer) to Firestore is in a listener. When that listener is destroyed as part of the termination, `api::Firestore` is destroyed as well. Importantly, termination happens on the worker queue. The destructor of `api::Firestore` calls `Dispose`, which presumes it's not called from the worker queue and tries to enqueue work, leading to a failing assertion and a crash.

The general problem is that `FirestoreClient::Dispose` may be invoked on the worker queue from `TerminateInternal` and needs a way to handle it gracefully. This PR largely makes `Dispose` a no-op in that case.

Fixes #6909.